Skip to content

refactor: framework de-coupling — generic runtime, ASR as use case (v1.1)#2

Merged
aksOps merged 15 commits into
mainfrom
refactor/framework-decoupling
May 7, 2026
Merged

refactor: framework de-coupling — generic runtime, ASR as use case (v1.1)#2
aksOps merged 15 commits into
mainfrom
refactor/framework-decoupling

Conversation

@aksOps
Copy link
Copy Markdown
Contributor

@aksOps aksOps commented May 7, 2026

Summary

v1.0 (PR #1) made workflow correctness structural — typed tools, schema-validated boundaries, per-session locks. But the runtime itself still hardcoded ASR-incident-management vocabulary: _TERMINAL_TOOL_RULES knew mark_resolved/mark_escalated/notify_oncall by name, runtime/mcp_servers/observability.py lived in the framework dir, Session.id_format returned INC-*. Smoking gun: examples/code_review/ was structurally broken — every code_review session would land at needs_review because the framework didn't recognize its terminal tools.

This PR makes the runtime structurally framework-generic. ASR is now a use case, not the framework.

What changed (4 phases, 12 commits, +3.4k LOC, 927 tests passing)

Phase Concern Fix
5 (DECOUPLE-01) Hardcoded session-id prefix FrameworkAppConfig.session_id_prefix (1-16 chars, validated). Apps configure: INC / REVIEW / SES. 16 boundary tests.
6 (DECOUPLE-02, 03, 06) Tool names + status vocab in src/runtime/ App-registered OrchestratorConfig.terminal_tools + statuses + default_terminal_status. Generic extract_fields for team/severity/etc. metadata capture. Parameterized escalate path (escalate_action_tool_name + escalate_action_default_team). Concept-leak ratchet test enforces zero incident-vocabulary in framework.
7 (DECOUPLE-04) App MCP servers in framework dir runtime/mcp_servers/{observability,remediation,user_context}.pyexamples/incident_management/mcp_servers/. Framework discovers via OrchestratorConfig.mcp_servers: list[str] dotted paths. Apps own their config-binding contracts (register(mcp_app, cfg)). 80 import sites flipped.
8 (DECOUPLE-05, 07) State schema + e2e validation OrchestratorConfig.state_overrides_schema (dotted-path pydantic class). IncidentStateOverrides + CodeReviewStateOverrides boundary tests. tests/test_code_review_e2e.py walks an actual code_review session and asserts terminal status from code_review's vocabulary, not needs_review. Generic TerminalToolRule.match_args discriminator added to support code_review's set_recommendation(recommendation=approve|request_changes|comment) single-tool dispatch. Binary ratchet — RATCHET_ALLOWLIST is now empty.

Tests

  • 927 passing, 3 skipped (was 869 at v1.0; +58 net new tests)
  • Binary concept-leak ratchet: git grep -E 'mark_(resolved|escalated)|notify_oncall|submit_hypothesis|update_incident|apply_fix' src/runtime/ returns zero matches
  • code_review e2e: 6/6 GREEN — approve / request_changes / comment dispatched correctly via match_args; sys.modules snapshot diff confirms zero incident_management* imports leaked into code_review session
  • All 4 dist bundles regenerated and AST-valid

What the framework no longer knows

  • Hardcoded session-id format (INC-*)
  • Tool names (mark_resolved, mark_escalated, notify_oncall, submit_hypothesis, update_incident, apply_fix)
  • Status vocabulary (escalated, resolved, needs_review — all app-declared now)
  • "Team" / "escalation_teams" / "environment" semantics — apps own them via extract_fields
  • App MCP server module paths — apps declare via OrchestratorConfig.mcp_servers
  • State-overrides shape — apps register pydantic schemas

These are now structural guarantees enforced by the binary leak ratchet.

Migration impact

  • incident_management: YAML config now declares terminal_tools, statuses, default_terminal_status, escalate_action_tool_name, mcp_servers, state_overrides_schema. Existing skills, MCP server, and runtime behavior unchanged. End-to-end test suite passes.
  • code_review: Now structurally functional. YAML declares its own terminal_tools (3 rules dispatching set_recommendation via match_args), statuses (approved/changes_requested/commented/unreviewed), state_overrides_schema. e2e test verifies the genericity invariant.
  • Future apps: Drop a YAML config + a skills/ dir + an MCP server, declare the registry blocks, ship. No framework changes needed.

Known follow-ups (deferred to v1.2 Hardening)

  • HARD-01..09: LLM HTTP timeouts, dependency lockfile, pyright gate flip, bare-except cleanup, hardcoded ollama.com fallback URL, ApprovalWatchdog leak, singleton thread-safety, dist staleness CI gate, ui.py zero-coverage. All tracked in .planning/REQUIREMENTS.md v2 section.

Test plan

  • Full pytest: 927 passed, 3 skipped
  • Binary ratchet GREEN
  • dist/* regenerated (4 files: app.py, ui.py, apps/incident-management.py, apps/code-review.py)
  • code_review e2e GREEN (6/6 cases including cross-app state-override rejection)
  • ruff clean on touched files
  • Manual smoke: start a fresh code_review session in the Streamlit UI, walk through review → recommendation → verdict
  • Manual smoke: incident_management end-to-end via real LLM (verify no regression)

🤖 Generated with Claude Code

aksOps and others added 15 commits May 6, 2026 15:26
Replace hardcoded INC-YYYYMMDD-NNN session-id format with config-driven
prefix. Apps declare their prefix via FrameworkAppConfig.session_id_prefix
(default "SES", validated 1-16 chars, alphanumeric + hyphens). The
prefix threads through SessionStore -> Session.id_format -> repository.

Configs:
  - incident_management.yaml: INC
  - code_review.runtime.yaml:  REVIEW
  - config.yaml (default):     SES (also explicitly INC for asr-default)

Validation rejects empty / whitespace / symbols / underscore / >16-chars
at config-load time via field_validator.

First step toward v1.1 framework de-coupling — runtime no longer hardcodes
the incident-management session-id convention.

Tests: 869 passed, 3 skipped (16 new parametrized tests in
tests/test_session_id_prefix.py covering default, custom prefixes
INC/REVIEW/HR/MY-APP/16-char, and invalid inputs).

dist/* bundles regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New module src/runtime/terminal_tools.py introduces the type system
for the generic terminal-tool registry that replaces the hardcoded
_TERMINAL_TOOL_RULES table in orchestrator.py and the
_TYPED_TERMINAL_TOOLS frozenset in graph.py (consumed in Wave 2).

- TerminalToolRule: tool_name + status + extract_fields dict, all
  extra="forbid" to reject typos in app YAML at boot (D-06-01).
- extract_fields generalises the v1.0 _extract_team(team_keys)
  positional-tuple lookup into a {dest: [args.X|result.X]} dict so
  apps name fields whatever fits (team, severity, reviewer) (D-06-02).
- StatusDef: name + terminal + kind (Literal of 5 categories), no
  color or label (UI presentation owned by UIConfig.badges per
  D-06-05 rejected alternative).

The new module is vocabulary-free: zero references to
mark_resolved/mark_escalated/notify_oncall/etc, satisfying
DECOUPLE-02 for the framework side.
Add three new fields and a cross-field model_validator to
OrchestratorConfig so apps can declare their terminal-tool rules
and status vocabulary in YAML without writing Python (D-06-03).

Fields:
- terminal_tools: list[TerminalToolRule] (D-06-01)
- statuses: dict[str, StatusDef] (D-06-05)
- default_terminal_status: str | None (D-06-06)

Cross-field invariants enforced at config-load (raises
pydantic.ValidationError):
- statuses non-empty => default_terminal_status required
- default_terminal_status must reference a declared status
- referenced default status must have terminal=True
- every terminal_tools[i].status must reference a declared status
- empty statuses with non-empty terminal_tools or set
  default_terminal_status is rejected (config mistake)

Bare OrchestratorConfig() still constructs with empty registry —
the framework default for unconfigured apps stays valid. Also
tightens model_config = {"extra": "forbid"} on OrchestratorConfig
so unknown YAML keys fail loudly at boot, matching the
extra="forbid" pattern used by sibling configs in this file.

Wave 2 (06-02) will consume these fields in
_finalize_session_status / _harvest_typed_terminal and ship the
same-PR incident_management.yaml registration (D-06-04).
tests/test_status_vocabulary.py exercises the new
OrchestratorConfig.{terminal_tools,statuses,default_terminal_status}
plus the TerminalToolRule / StatusDef pydantic models. Covers
DECOUPLE-03 acceptance: framework rejects unknown statuses and
unknown defaults at config-load, not at gateway-eval time.

Test coverage:
- TerminalToolRule shape: minimal + extract_fields round-trip +
  extra="forbid" rejects unknown keys (D-06-01, D-06-02)
- StatusDef shape: minimal + Literal kind enforcement +
  extra="forbid" rejects color (D-06-05 — UI presentation leak)
- OrchestratorConfig happy path with mixed rules and statuses
- bare OrchestratorConfig() still constructs (framework default
  invariant)
- default_terminal_status without statuses rejected
- terminal_tools without statuses rejected
- default required when statuses non-empty
- default must reference a declared status (error names valid keys)
- default must be terminal=True (in_progress fails)
- terminal_tools[i].status must reference a declared status (error
  names the index and the offending value, both for index 0 and
  later indices)
- extra="forbid" on OrchestratorConfig survives the new fields

All 16 tests pass; full suite at 885 passed (was 869).
Two new test files; both fail until Tasks 2.2/2.3/2.4 land:

- tests/test_concept_leak_ratchet.py — DECOUPLE-06 binary ratchet
  walking src/runtime/ for the 6 forbidden tokens (mark_resolved,
  mark_escalated, notify_oncall, submit_hypothesis, update_incident,
  apply_fix). Currently fails because _TERMINAL_TOOL_RULES still
  lives in orchestrator.py and _TYPED_TERMINAL_TOOLS in graph.py.

- tests/test_terminal_tool_registry.py — 9 finalize-path integration
  tests covering both incident_management and a synthetic
  code_review-style registration. Currently fails because
  Orchestrator._infer_terminal_decision / _extract_field are not
  yet instance methods (D-06-08).

RED phase commit per the v1.1 milestone TDD discipline. GREEN
follows in the atomic framework-migration commit.
…rized escalate (DECOUPLE-02, DECOUPLE-03, DECOUPLE-06)

Migrates `_finalize_session_status` and the typed-terminal harvester to
read app-registered rules off `OrchestratorConfig` instead of the v1.0
hardcoded `_TERMINAL_TOOL_RULES` / `_TYPED_TERMINAL_TOOLS` tables. Ships
the framework refactor and the matching incident_management YAML
registration in a single atomic commit per D-06-04.

Decisions cited:
- D-06-02: extract_fields lookup syntax preserved (args.X / result.X).
- D-06-03: registry lives in app YAML; pydantic-validated at boot.
- D-06-04: same-PR atomic commit — incident_management end-to-end
  flow stays green throughout.
- D-06-05: StatusDef.kind drives the escalated_to mirror dispatch
  rather than hardcoded vocabulary.
- D-06-06: default_terminal_status is app-owned (incident_management
  uses needs_review; code_review uses unreviewed).
- D-06-08: `_finalize_session_status` reads `self.cfg.orchestrator`
  directly via instance method shape.

Resolutions applied:
- Resolution A (escalate path) — Option B: parameterised on the new
  `OrchestratorConfig.escalate_action_tool_name` and
  `escalate_action_default_team` fields. The orchestrator's
  `resume_session(action="escalate")` looks up the matching rule in
  `terminal_tools`, drives status assignment from `rule.status`, and
  pulls extra-field destination key from `rule.extract_fields`.
  Hardcoded `notify_oncall` / `platform-oncall` / `escalated_to`
  literals removed from `orchestrator.py`.
- Resolution B (ratchet scope) — Option 3: `RATCHET_ALLOWLIST`
  carries 8 entries with phase-handles (`Phase 7` for
  `mcp_servers/remediation.py`; `Phase 8` for `terminal_tools.py`
  docstrings + `storage/event_log.py` docstrings). Companion
  meta-test asserts each entry still matches a real line — when
  Phase 7 / 8 ship and the offending file is gone, the meta-test
  fails and forces stale-entry deletion. Allowlist is shrink-only.

New OrchestratorConfig fields:
- `patch_tools: list[str]` — tools whose `args.patch` blob the
  harvester folds into agent confidence/signal/rationale. Avoids a
  third hardcoded `update_incident` leak by routing the recognition
  through YAML.
- `harvest_terminal_tools: list[str]` — tools the harvester treats
  as typed-terminal for confidence capture but the orchestrator's
  finalize path does NOT transition status on. Used for
  `submit_hypothesis`-style stage-complete signals.
- `escalate_action_tool_name: str | None` — MCP tool the
  orchestrator invokes when a user clicks Escalate. None means
  no side-effect.
- `escalate_action_default_team: str | None` — fallback team when
  the user did not pick one.

Same-PR YAML registration shipped in:
- config/incident_management.yaml
- config/config.yaml
- config/config.yaml.example
- config/code_review.runtime.yaml (placeholder, full migration in
  Phase 8 per D-06-04)

Test fixtures updated to provide `cfg` to the finalize-only stub
orchestrators (test_finalize_status_inference.py, test_session_lock.py,
test_finalize_concurrent.py) and to pass `terminal_tool_names` /
`patch_tool_names` to `_harvest_tool_calls_and_patches` /
`make_agent_node` (test_harvester_typed.py, test_agent_node.py,
test_gate.py). The `cfg` fixture in test_resume.py now mirrors the
incident_management YAML registration so the parameterised escalate
path flows through unchanged behaviour for legacy callers.

Verification:
- pytest: 899 passed (vs 870 baseline pre-Phase 6).
- ratchet `git grep` — every match is allowlisted and explained.
- ruff: clean on every touched file.
- All three production YAMLs (config.yaml,
  config.yaml.example, code_review.runtime.yaml) validate against
  the new OrchestratorConfig schema.

dist/ bundle regen deliberately deferred to Plan 06-03.
… close-out

Phase 6 close-out (Generic Terminal-Tool Registry + Status Vocabulary).
Final wave: regenerated dist/app.py, dist/apps/incident-management.py,
dist/apps/code-review.py from the post-Phase-6 src/runtime/ + examples/
tree. dist/ui.py regenerated to byte-identical content (config-driven UI
shell already generic; no v1.1 surface changes). The bundles now embed
TerminalToolRule + StatusDef pydantic models, drop the v1.0
_TERMINAL_TOOL_RULES table and _TYPED_TERMINAL_TOOLS frozenset, and
carry per-rule extract_fields generalising the v1.0 _extract_team
lookup.

Phase 6 final acceptance:
  - git grep leak-ratchet: 9 matches, all allowlisted (2 Phase 7,
    7 Phase 8) per RATCHET_ALLOWLIST in test_concept_leak_ratchet.py
  - pytest -q: 899 passed (Phase 5 baseline 873; +28 from Phase 6:
    +14 status_vocabulary, +9 terminal_tool_registry, +5 ratchet)
  - bundle smoke tests: 12 passed
  - ruff src/runtime/ tests/ examples/: clean

Phase 6 totals: 3 plans, 11 tasks, 6 commits across 3 waves
(8a249f3, 700b4e1, d7e8b26, 1b43017, 070e15d, this commit).
Closes DECOUPLE-02, DECOUPLE-03, DECOUPLE-06.

STATE.md and 06-03-SUMMARY.md document the full hand-off to Phase 7
(DECOUPLE-04) + Phase 8 (DECOUPLE-05 + 07) — both files are gitignored
per project policy and NOT included in this commit.

Ready for /gsd-transition to Phase 7.
…covery (DECOUPLE-04)

Phase 7 / DECOUPLE-04 — single atomic commit per D-07-04. Framework no
longer hardcodes incident-vocabulary MCP servers; apps declare their
per-tool servers via a generic configurable dotted-path list.

* D-07-01 — moved (git mv, history preserved):
    src/runtime/mcp_servers/{observability,remediation,user_context,__init__}.py
      -> examples/incident_management/mcp_servers/

* D-07-02 — new field OrchestratorConfig.mcp_servers: list[str]
  (src/runtime/config.py). YAML wiring:
    config/config.yaml + config/incident_management.yaml + config.yaml.example
      declare orchestrator.mcp_servers with the 3 dotted paths under
      examples.incident_management.mcp_servers.*
    config/code_review.runtime.yaml declares orchestrator.mcp_servers: []
      — empty-list branch verified clean (dist/apps/code-review.py
      contains zero examples.incident_management.mcp_servers refs).

* D-07-03 — converted setters to register(mcp_app, cfg) contract:
    observability.set_environments  -> register(mcp_app, cfg) closing over
      cfg.environments via _make_environment_validator (snapshotted; no
      module-level mutable list).
    remediation.set_escalation_teams -> register(mcp_app, cfg) closing
      over cfg.framework.escalation_teams (or cfg.escalation_teams
      fallback) into a module-level snapshot tuple.
    user_context -> no-op register(mcp_app, cfg) for contract uniformity.

  Chosen contract: two-arg register(mcp_app, cfg). The orchestrator
  passes mcp_app=None — modules expose their own per-module FastMCP
  instance composed by the loader; mcp_app exists for future
  composition needs.

* orchestrator.py:357-365 (now ~365-380): replaced the silent-swallow
  try/except setter calls with a single importlib-driven loop:

      for module_path in cfg.orchestrator.mcp_servers:
          mod = importlib.import_module(module_path)
          reg = getattr(mod, "register", None)
          if reg is None:
              raise RuntimeError(
                  f"orchestrator.mcp_servers entry {module_path!r} does "
                  f"not expose a `register(mcp_app, cfg)` callable"
              )
          reg(None, cfg)

  No more silent except — missing register raises explicitly per the
  must_haves contract.

* Bundler patched: scripts/build_single_file.py
  RUNTIME_MODULE_ORDER:66-68 (the three hardcoded mcp_servers paths)
  removed from the framework-only block; the equivalent entries added
  to INCIDENT_APP_MODULE_ORDER (under EXAMPLES_ROOT) so the per-tool
  modules ship in dist/apps/incident-management.py without leaking
  into dist/app.py (framework-only) or dist/apps/code-review.py.

* Bundles regenerated: dist/app.py, dist/ui.py,
  dist/apps/incident-management.py, dist/apps/code-review.py.

* Import-site flip: ~60 sites across 19 test files + 4 config YAMLs
  changed `runtime.mcp_servers.*` -> `examples.incident_management.mcp_servers.*`.
  `git grep -E 'runtime\\.mcp_servers|runtime/mcp_servers' -- src/ tests/
  apps/ scripts/ examples/ config/` returns zero hits.

* Ratchet allowlist (tests/test_concept_leak_ratchet.py): removed the
  two `src/runtime/mcp_servers/remediation.py` entries (`apply_fix`,
  `notify_oncall`) and the explanatory Phase-7 comment. The
  `test_allowlist_entries_actually_match` meta-test stays GREEN
  because the file no longer exists under that path — leaving the
  entries would have failed the meta-test (stale allowlist).

* Test contract migration: tests/test_notify_oncall_team_required.py
  was importing the now-removed `set_escalation_teams` directly. It
  now binds the roster via the public `register(mcp_app, cfg)`
  contract with a SimpleNamespace cfg — same observable behavior,
  same three test cases. tests/test_resume.py cfg fixture: added
  `mcp_servers=[...]` to OrchestratorConfig so the orchestrator's new
  discovery loop binds the escalation roster (otherwise the snapshot
  would carry over from prior tests, a fixture-completeness bug
  exposed by the contract change — Rule 3).

Verification:
  * git ls-files src/runtime/mcp_servers/        -> empty
  * git ls-files examples/incident_management/mcp_servers/
      -> 4 files (__init__, observability, remediation, user_context)
  * full pytest suite: 899 passed (baseline preserved)
  * ruff check (touched files): All checks passed
  * dist/apps/incident-management.py: contains the 3 relocated MCP
    server bodies (verified via grep for FastMCP("observability") etc.)
  * dist/apps/code-review.py: contains zero
    examples.incident_management.mcp_servers references — D-07-02
    empty-list branch GREEN.

Closes DECOUPLE-04 from REQUIREMENTS.md.
…t (DECOUPLE-05, DECOUPLE-07) — v1.1 milestone close

Closes milestone v1.1 (Framework De-coupling). Single atomic commit per
D-08-04 covering DECOUPLE-05 + DECOUPLE-07 + ratchet binarization.

DECOUPLE-05 (D-08-01, D-08-02):
* OrchestratorConfig.state_overrides_schema: str | None — dotted path
  to a pydantic BaseModel subclass (`mod.path:Class` or `mod.path.Class`).
* importlib resolution at Orchestrator.create(); bad path raises
  RuntimeError with the offending path AND class name. Non-BaseModel
  targets are caught explicitly.
* start_session(state_overrides=…) runs cls.model_validate(...) after
  _coerce_state_overrides and before store.create. None skips
  validation entirely — D-08-02 backward compatibility.
* New IncidentStateOverrides (environment, severity) and
  CodeReviewStateOverrides (pr_url, repo, base_branch, pr_number).
  Both extra='forbid' so typos surface at session-start, not at
  gateway-eval (PVC-06 generalization).
* All three runtime YAMLs declare state_overrides_schema.

DECOUPLE-07 (D-08-03):
* TerminalToolRule.match_args: dict[str, str] = {} — NEW optional
  argument-value discriminator. Empty default preserves v1.0 single-
  rule shape; non-empty restricts the rule to tool calls whose
  args[k] == v for every declared key. Generic feature any app can
  use (priority, severity, recommendation, …).
* code_review.runtime.yaml: 5-status vocab + 3 set_recommendation
  rules dispatching by args.recommendation -> approved /
  changes_requested / commented; default_terminal_status: unreviewed.
* tests/test_code_review_e2e.py (6 tests): real Orchestrator from
  YAML + synthesized set_recommendation ToolCall + finalize hook.
  Asserts code_review-vocabulary status (NOT unreviewed/needs_review),
  state_overrides validation rejects unknown keys + cross-app shapes,
  no incident_management modules in sys.modules after the test.

Concept-leak ratchet binary closure:
* Scrubbed mark_resolved/mark_escalated/notify_oncall/apply_fix from
  src/runtime/terminal_tools.py:19-20,45-47 and
  src/runtime/storage/event_log.py:3,37. Replacement uses neutral
  <terminal_tool> / set_recommendation placeholders.
* RATCHET_ALLOWLIST = {} (was 5 entries).
* git grep '\b(mark_(resolved|escalated)|notify_oncall|submit_hypothesis|update_incident|apply_fix)\b' src/runtime/ — zero matches.

Bundler:
* INCIDENT_APP_MODULE_ORDER and CODE_REVIEW_APP_MODULE_ORDER list
  their app's state.py first. dist/* regenerated; both schema
  classes embedded; ast.parse() OK.

Tests: 927 passed (was 899) — 22 new schema + 6 new e2e. ruff clean
on all touched files. RATCHET_ALLOWLIST: {}.

Closes DECOUPLE-05 and DECOUPLE-07 — 7/7 v1.1 DECOUPLE-* shipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_with_schema

CI was missing OLLAMA_API_KEY env var, causing _interpolate to KeyError when
loading config/config.yaml. Tests should not depend on real secrets — set a
placeholder via monkeypatch so the YAML parses without making real Ollama
calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ds_with_schema

Prior fix only set OLLAMA_API_KEY but config.yaml has 5 `${VAR}` refs:
OLLAMA_API_KEY, AZURE_ENDPOINT, AZURE_OPENAI_KEY, EXTERNAL_MCP_URL, EXT_TOKEN.
Set placeholders for all of them so _interpolate succeeds in CI without any
real secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 7, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
58.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@aksOps aksOps merged commit 0ff8914 into main May 7, 2026
7 of 8 checks passed
@aksOps aksOps deleted the refactor/framework-decoupling branch May 14, 2026 06:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant